Training the YOLOv8 Object Detector for OAK


Table of Contents

Training the YOLOv8 Object Detector for OAK-D
Introduction
A Primer on YOLOv8
Configuring Your Development Environment
Need Help Configuring Your Development Environment?
About the Dataset
YOLOv8 Label Format
Hand Gesture Recognition Dataset
YOLOv8 Training
Selecting the Model
Downloading the Hand Gesture Recognition Dataset
Configuration Setup
Understanding YOLOv8 Command Line Interface (CLI)
Training the YOLOv8n Model
Visualizing Model Artifacts
Evaluating YOLOv8n on the Test Dataset
Training the YOLOv8s Model
Evaluating YOLOv8s on the Test Dataset
Summary
Citation Information

Training the YOLOv8 Object Detector for OAK-D

In this tutorial, you will learn to train a YOLOv8 object detector to recognize hand gestures in the PyTorch framework using the Ultralytics repository and the Hand Gesture Recognition Computer Vision Project dataset hosted on Roboflow. The goal is to train a YOLOv8 variant that can learn to recognize 1 of 5 hand gestures (i.e., one, two, three, four, and five) with good mean average precision (mAP). Furthermore, since this tutorial acts as a strong base for an upcoming tutorial, the trained YOLOv8 variant should be able to run inference in near real-time on the OpenCV AI Kit (OAK), which comes powered with the Intel MyriadX neural hardware accelerator.

This lesson is the first in our series on OAK 102:

Training the YOLOv8 Object Detector for OAK-D (this tutorial)
Gesture Recognition with YOLOv8 on OAK-D in Near Real-Time

To learn how to train a YOLOv8 object detector on a hand gesture dataset for OAK-D, just keep reading.

Introduction

Object detection is one of the most exciting problems in the computer vision domain. The progress in this domain has been significant; every year, the research community achieves a new state-of-the-art benchmark. And, of course, none of this would have been possible without the power of Deep Neural Networks (DNNs) and the massive computational power of NVIDIA GPUs.

It all started when Redmon et al. (2016) published the YOLO research community gem, “You Only Look Once: Unified, Real-Time Object Detection,” at the CVPR (Computer Vision and Pattern Recognition) Conference. YOLO, or YOLOv1, was the first single-stage object detection model. It quickly gained popularity due to its high speed and accuracy.

The authors continued from there. Redmon and Farhadi (2017) published YOLOv2 at the CVPR Conference and improved the original model by incorporating batch normalization, anchor boxes, and dimension clusters.

And then came the YOLO model wave. In 2023, we arrived at Ultralytics YOLOv8. Yes, you read it right! From the day YOLOv1 was out, a new version of YOLO was published every year with improvements in both speed and accuracy.

Today, YOLO is the go-to object detection model in the computer vision community since it is the most practical object detector focusing on speed and accuracy.

Figure 1 shows the progression of YOLO models from YOLOv1 to PP-YOLOv2. One interesting aspect of the figure is the YOLOv5 model by Ultralytics, published in 2020; this year, they released yet another state-of-the-art object detection model, YOLOv8. And today's tutorial is all about experimenting with YOLOv8, but for OAK-D.

Figure 1: History of YOLO (source: Introduction to the YOLO Family).

If you would like to learn about the entire history of the YOLO family, we highly recommend you check out our series on YOLO!

A Primer on YOLOv8

YOLOv8 is the latest version of the YOLO object detection, classification, and segmentation model developed by Ultralytics. At the time of writing, YOLOv8 is a state-of-the-art, cutting-edge model. Like its predecessors, YOLOv8 builds upon the success of previous YOLO versions, and its new features and improvements boost performance and accuracy, making it the most practical object detection model.

One key feature of YOLOv8 is its extensibility. It is designed as a framework that supports all previous versions of YOLO, making it easy to switch between versions and benchmark their performance. This makes YOLOv8 an ideal choice for users who want to take advantage of the latest YOLO technology while still being able to use their existing YOLO models.

Table 1 shows the performance (mAP) and speed (frames per second (FPS)) benchmarks of five YOLOv8 variants on the MS COCO (Microsoft Common Objects in Context) validation dataset at 640×640 image resolution on an NVIDIA A100 GPU. All five models were trained on the MS COCO training dataset. The benchmarks are listed in ascending order of model size, starting with YOLOv8n (i.e., the nano variant with the smallest model footprint) and ending with the largest model, YOLOv8x. We will train the Nano and Small variants of YOLOv8, as they fit well within the OAK's compute budget.

Table 1: Performance and Speed benchmarks of five YOLOv8 variants on the MS COCO dataset.

The innovation is not just limited to YOLOv8’s extensibility. Some more prominent innovations that directly relate to its performance and accuracy include

a new backbone network
a new anchor-free detection head
a new loss function

YOLOv8 is also highly efficient and can run on various hardware platforms, from CPUs to GPUs to Embedded Devices like OAK. And as you already know, our goal is to run YOLOv8 on an embedded hardware platform (i.e., an OAK edge device).

Figure 2 compares YOLOv8 with previous YOLO versions: YOLOv7, YOLOv6, and Ultralytics YOLOv5. The comparison is made in two ways: mAP vs. model parameters and mAP vs. latency measured on an A100 GPU. The figure shows that almost all the YOLOv8 variants achieve the highest mAP on the COCO validation dataset, while also having fewer model parameters and lower latency when benchmarked on the NVIDIA A100 architecture.

Figure 2: Comparison of YOLOv8 with previous YOLO variants in terms of mAP vs. Model Parameters (left) and mAP vs. Latency on A100 GPU (right) (source: https://github.com/ultralytics/ultralytics).

Overall, YOLOv8 is hands down a powerful and flexible framework for object detection offered in PyTorch.

This tutorial is the first in our OAK-102 series, and we hope you have followed the tutorials in our OAK-101 series. If not, we highly recommend you check out the OAK-101 series, which builds a strong foundation for the OpenCV AI Kit. You will learn the OAK hardware and software stack from the ground up; for example, you will learn to train and deploy an image classification TensorFlow model on an OAK edge device.

This tutorial will cover more advanced Computer Vision applications and how to deploy these advanced applications onto the OAK edge device.

Now, let’s start with today’s tutorial and learn to train the hand gesture recognition model for OAK!

Configuring Your Development Environment

To follow this guide, you need to clone the Ultralytics repository and pip install all the necessary packages via the setup and requirements files.

Luckily, to run the YOLOv8 training, you can do a pip install on the ultralytics cloned folder, meaning all the libraries are pip-installable!

One good news is that YOLOv8 has a command line interface, so you do not need to run Python training and testing scripts. With just the yolo command, you get most functionalities like modes, tasks, etc. Do not worry; today’s tutorial will cover the important command line arguments!

$ git clone https://github.com/ultralytics/ultralytics
$ pip install ultralytics

Need Help Configuring Your Development Environment?

Figure 3: Need help configuring your dev environment? Want access to pre-configured Jupyter Notebooks running on Google Colab? Be sure to join PyImageSearch University — you’ll be up and running with this tutorial in minutes.

All that said, are you:

Short on time?
Learning on your employer’s administratively locked system?
Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
Ready to run the code immediately on your Windows, macOS, or Linux system?

Then join PyImageSearch University today!

Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.

And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!

About the Dataset

For today’s experiment, we will train the YOLOv8 model on the Hand Gesture Recognition Computer Vision Project dataset hosted on Roboflow.

This dataset is public, and we download it from Roboflow, which provides a great platform for training models on a wide variety of Computer Vision datasets. Even more interesting, you can download the dataset in multiple formats like COCO JSON, YOLO Darknet TXT, and YOLOv8 PyTorch, which saves the time of writing helper functions to convert the ground-truth annotations into the format required by these object detection models.

YOLOv8 Label Format

Since we will train the YOLOv8 PyTorch model, we will download the dataset in YOLOv8 format. The ground-truth annotation format of YOLOv8 is the same as other YOLO formats (see Figure 4), so you could also write your own conversion script. Each image has one text file with a single line per bounding box. For example, if four objects exist in one image, the text file would have four rows, each containing the class label and bounding box coordinates. The format of each row is

class_id center_x center_y width height

where fields are space-delimited, and the coordinates are normalized from 0 to 1. To convert to normalized xywh from pixel values:

divide x and the box width by the image’s width
divide y and the box height by the image’s height

Figure 4: YOLOv8 bounding box format example (source: https://roboflow.com/formats/yolov5-pytorch-txt).
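To make the conversion concrete, below is a minimal sketch (not part of the dataset tooling or this tutorial's notebook) that turns a pixel-space box given by its corner coordinates into a normalized YOLO-format row; the function name and example values are purely illustrative.

def to_yolo_row(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    # Convert a pixel-space box (x_min, y_min, x_max, y_max) into the
    # normalized (center_x, center_y, width, height) format in [0, 1].
    center_x = ((x_min + x_max) / 2) / img_w
    center_y = ((y_min + y_max) / 2) / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"

# Example: a box on a 416x416 image belonging to class 0
print(to_yolo_row(0, 100, 120, 220, 300, 416, 416))
# 0 0.384615 0.504808 0.288462 0.432692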

Hand Gesture Recognition Dataset

This dataset contains 839 images of 5 hand gesture classes for object detection: one, two, three, four, and five. With the help of five fingers, one- to five-digit combinations are formed, and the object detection model is trained on these hand gestures with the respective labels, as shown in Figure 5. The dataset is split into training, validation, and testing sets, comprising 587 training, 167 validation, and 85 testing images. Each image has a 416×416 resolution with only one object (or instance).

Figure 5 shows sample images from the dataset with ground-truth bounding boxes annotated in red, belonging to classes four, five, two, and three.

Figure 5: Sample images from the Hand Gesture Recognition Dataset with ground-truth annotations (source: image by the author).

Since only one object (gesture or class) is present in each image, there are 587 regions of interest (objects) in the 587 training images, meaning there is precisely one object per image. Based on the class distribution shown in Figure 6, class five contributes more than 45% of the objects. In contrast, the remaining classes (one, two, three, and four) are under-represented relative to gesture class five.

Figure 6: Class Distribution of the Hand Gesture Dataset, showing that more than 45% of the objects belong to hand gesture five (source: image by the author).

The Python code for data visualization (Figure 5) and class distribution graph (Figure 6) computation is provided inside the Google Colab Notebook of this tutorial!
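If you just want a quick look at the class balance without opening the notebook, a rough sketch along the following lines works. It assumes the YOLO-format directory layout shown later in this tutorial (hand_gesture_dataset/train/labels/*.txt) and the class ordering from data.yaml; it is not the notebook's exact code.

import glob
from collections import Counter

# Class names in the order defined in data.yaml
names = ['five', 'four', 'one', 'three', 'two']

counts = Counter()
for label_file in glob.glob("hand_gesture_dataset/train/labels/*.txt"):
    with open(label_file) as f:
        for line in f:
            class_id = int(line.split()[0])  # first field of each row is the class id
            counts[names[class_id]] += 1

total = sum(counts.values())
for name, count in counts.most_common():
    print(f"{name}: {count} ({100 * count / total:.1f}%)")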

YOLOv8 Training

This section is the heart of today’s tutorial, where we will cover most of the tasks, including

Selecting the model
Downloading the dataset
Creating the data configuration
Understanding the YOLOv8 command line interface
Training the YOLOv8 nano model
Visualizing the YOLOv8 nano model artifacts
Qualitative and quantitative evaluation of testing data
Training the YOLOv8 small model
Evaluating the YOLOv8 small variant on testing data

Selecting the Model

Figure 7 shows the 5 YOLOv8 variants, starting with the smallest, YOLOv8 nano, built for running on mobile and embedded devices, and ending with YOLOv8 XLarge on the other end of the spectrum. For today's experiment, we will work with two variants: Nano and Small. We chose these two because our final goal is to run the YOLOv8 model on an OAK-D device that can recognize hand gestures, and as the figure shows, the Nano and Small variants have smaller memory footprints than the higher-end variants.

Figure 7: YOLOv8 variants starting with YOLOv8 Nano to YOLOv8 XLarge (source: image by the author).

OAK-D, an embedded device, has computation constraints. This doesn't mean that higher-end variants like Medium and Large won't work on OAK-D, but their performance (FPS) would be lower. Hence, we choose Nano and Small, as they balance accuracy and performance well.

One more observation from Figure 7 is that the mAP improvements from Medium to XLarge are minute. However, the algorithm processing time increases significantly, which would pose a problem for deploying these models on OAK devices.

Downloading the Hand Gesture Recognition Dataset

# Download the hand gesture recognition dataset
!mkdir hand_gesture_dataset
%cd hand_gesture_dataset
!curl -L -s "https://universe.roboflow.com/ds/zafYqbWHn8?key=n1igBaphSm" > hand_gesture.zip
!unzip -q hand_gesture.zip
!rm hand_gesture.zip

On Lines 2 and 3, we create the hand_gesture_dataset directory and cd into it, where we download the dataset. Then, on Line 4, we use the curl command and pass the dataset URL we obtained from the Hand Gesture Recognition Computer Vision Project. Finally, we unzip the dataset and remove the zip file on Lines 5 and 6.

Let’s look at the contents of the hand_gesture_dataset folder:

$ tree /content/hand_gesture_dataset -L 2
/content/hand_gesture_dataset
├── data.yaml
├── README.dataset.txt
├── README.roboflow.txt
├── test
│   ├── images
│   └── labels
├── train
│   ├── images
│   └── labels
└── valid
    ├── images
    └── labels

9 directories, 3 files

The parent directory has 3 files, out of which only data.yaml is essential, and 3 subdirectories:

data.yaml: the data-related configuration, such as the train and valid data directory paths, the total number of classes in the dataset, and the name of each class
train: training images along with training labels
valid: validation images with annotations
test: test images and labels

Configuration Setup

Next, we will edit the data.yaml file to set the absolute dataset path and the relative train, valid, and test image directories.

import yaml

# Create configuration
config = {
    "path": "/content/hand_gesture_dataset",
    "train": "train",
    "val": "valid",
    "test": "test",
    "nc": 5,
    "names": ['five', 'four', 'one', 'three', 'two']
}

with open("hand_gesture_dataset/data.yaml", "w") as file:
    yaml.dump(config, file, default_flow_style=False)

On Lines 5-10, we define the data path, the train, validation, and test directories, the number of classes, and the class names in a config dictionary.

Finally, on Lines 13 and 14, we:

open the existing data.yaml file that was downloaded along with the dataset
overwrite it with the contents of config
store it on disk

Understanding the YOLOv8 Command Line Interface

The good news is that YOLOv8 also comes with a command line interface (CLI) and Python scripts, making training, testing, and exporting the models much more straightforward. In addition, the YOLOv8 CLI allows for simple single-line commands without needing a Python environment. For example, as shown in the shell blocks below, all tasks related to the YOLO model can be run from the terminal using the yolo command.

!yolo TASK MODE ARGS

Please note in the above command line that TASK, MODE, and ARGS are just placeholders you will need to replace with actual values, which we discuss next.

TASK is an optional parameter; if not passed, YOLOv8 will determine the task from the model type, which means it’s intelligently designed. The TASK can be detect, segment, or classify.

MODE is a required parameter that can be either train, val, predict, export, track, or benchmark. This parameter helps tell YOLOv8 whether you want to use it for

training the model on a custom dataset
validating a trained model
making predictions with the trained weights on images/videos
converting or exporting the trained model to a format that can be deployed
running a YOLOv8 detection or segmentation model in conjunction with tracking algorithms like BoT-SORT or ByteTrack to perform object tracking on video streams
benchmarking the YOLOv8 exports (e.g., TensorRT) for speed and accuracy (for example, see Table 1)

Finally, ARGS is an optional parameter with various custom configuration settings used during training, validation/testing, prediction, exporting, and all the YOLOv8 hyperparameters. Examples of ARGS can be image size, batch size, learning rate, etc. To learn more about all the available configurations, check out the default.yaml file in the Ultralytics repository.

In short, the YOLOv8 CLI is a powerful tool that puts YOLOv8 at your fingertips by providing features such as

model training
model validation and testing
exporting a trained model to various formats
10-15 types of data augmentations
training logs
model checkpoints
mAP and loss plots
file management

Let’s look at a few examples of how YOLOv8 CLI can be leveraged to train, predict, and export the trained model.

Fine-tune a pretrained YOLOv8 nano detection model for 20 epochs with an initial learning rate of 0.01:

!yolo train data=coco128.yaml model=yolov8n.pt epochs=20 lr0=0.01

Predict a YouTube video using a pretrained YOLOv8 nano segmentation model at image size 320×320:

!yolo predict model=yolov8n-seg.pt source='https://youtu.be/Zgi9g1ksQHc' imgsz=320

Export a YOLOv8n classification model to ONNX (Open Neural Network Exchange) format at image size 224×224:

!yolo export model=yolov8n-cls.pt format=onnx imgsz=224,224

Voilà! Isn't it surprising how easy it is to perform training, prediction, and even model conversion with just a single command?
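If you prefer to stay inside Python (e.g., in a notebook cell), the same operations are also exposed through the Ultralytics Python API. The snippet below is a rough equivalent of the three CLI examples above, shown only as a sketch; it is not part of this tutorial's training pipeline.

from ultralytics import YOLO

# Fine-tune a pretrained YOLOv8 nano detection model
model = YOLO("yolov8n.pt")
model.train(data="coco128.yaml", epochs=20, lr0=0.01)

# Predict a YouTube video with a pretrained YOLOv8 nano segmentation model
seg_model = YOLO("yolov8n-seg.pt")
seg_model.predict(source="https://youtu.be/Zgi9g1ksQHc", imgsz=320)

# Export a YOLOv8n classification model to ONNX format
cls_model = YOLO("yolov8n-cls.pt")
cls_model.export(format="onnx", imgsz=224)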

Training the YOLOv8n Model

Alright! We are almost ready to train the YOLOv8 nano and small object detection models. However, before we run the training, let's understand a few parameters that we will use while training.

We define a few standard model parameters:

imgsz: Image size or network input while training. The images will be resized to this value before being fed to the network; the preprocessing pipeline will resize them to 416 pixels.
data: Path to the data .yaml file, which has the training, validation, and testing data paths and the class label information.
batch: Number of images fed as a single batch into the network for a forward pass. You can modify it according to the GPU memory available. We have set it to 32.
epochs: Number of times we want to train the model on the entire hand gesture training dataset. We will train the model for 20 epochs.
model: Path to the base model we want to use for training. We use the nano model yolov8n from the YOLOv8 family.
project: This will create a project directory inside the current working directory (gesture_train_logs).
name: Each time you run this model, it will create a subdirectory yolov8n under the project directory, which will contain a lot of information about the model (e.g., weights, sample input images, a few validation prediction outputs, metrics plots, etc.).

!yolo train model=yolov8n.pt data=hand_gesture_dataset/data.yaml epochs=20 imgsz=416 \
batch=32 project=gesture_train_logs name=yolov8n device=0

The training will start if there are no errors, as shown below. The logs indicate that the YOLOv8 model will train with Torch version 1.13.1 on a Tesla T4 GPU, and they show the initialized hyperparameters.

The yolov8n.pt weights are downloaded, which means the YOLOv8n model is initialized with parameters pretrained on the MS COCO dataset. Finally, we can see that two epochs have been completed with a mAP@0.5 of 0.238.

Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to yolov8n.pt... 100% 6.23M/6.23M [00:00

